103 research outputs found

    Cued Speech: A visual communication mode for the Deaf society

    Cued Speech is a visual mode of communication that uses handshapes and placements in combination with the mouth movements of speech to make the phonemes of a spoken language look different from each other and clearly understandable to deaf individuals. The aim of Cued Speech is to overcome the problems of lip reading and thus enable deaf persons to fully understand spoken language. In this study, automatic phoneme recognition in Cued Speech for French based on hidden Markov models (HMMs) is introduced. The phoneme correctness was 82.9% for a normal-hearing cuer and 81.5% for a deaf cuer. The results also showed that creating cuer-independent HMMs should not face any specific difficulties other than those encountered in audio speech recognition.
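
    The abstract does not detail the recognizer, so the following is only a minimal sketch of per-phoneme HMM training and maximum-likelihood recognition, assuming frame-wise hand/lip feature vectors and the hmmlearn library; the feature extraction and corpus handling are hypothetical.

        # Minimal sketch: one Gaussian HMM per phoneme, recognition by maximum likelihood.
        import numpy as np
        from hmmlearn import hmm

        def train_phoneme_hmms(segments_by_phoneme, n_states=3):
            """segments_by_phoneme: dict phoneme -> list of (n_frames, n_features) arrays."""
            models = {}
            for phoneme, segments in segments_by_phoneme.items():
                X = np.vstack(segments)                   # stack all training segments
                lengths = [len(seg) for seg in segments]  # per-segment frame counts
                model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
                model.fit(X, lengths)
                models[phoneme] = model
            return models

        def recognize(models, segment):
            """Return the phoneme whose HMM assigns the highest log-likelihood to the segment."""
            return max(models, key=lambda p: models[p].score(segment))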

    GMM Mapping Of Visual Features of Cued Speech From Speech Spectral Features

    In this paper, we present a statistical method based on GMM modeling that maps acoustic speech spectral features to the visual features of Cued Speech under the Minimum Mean-Square Error (MMSE) regression criterion at a low signal level, which is innovative and differs from the classic text-to-visual approach. Two training methods for the GMM, an Expectation-Maximization (EM) approach and a supervised training method, are discussed. For comparison with the GMM-based mapping, we first present results obtained with a Multiple Linear Regression (MLR) model, also at the low signal level, and study the limitations of that approach. The experimental results demonstrate that the GMM-based mapping can significantly improve performance over the MLR model, especially when the linear correlation between the target and the predictor is weak, as is the case between the hand positions of Cued Speech and the acoustic speech spectral features.
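
    As an illustration of the MMSE criterion used here, a joint GMM over stacked acoustic/visual vectors yields the classical conditional-expectation mapping. The sketch below assumes paired acoustic (X) and visual (Y) frames and scikit-learn/SciPy; the supervised training variant discussed in the paper is omitted.

        # Minimal sketch: GMM-based MMSE mapping, E[y | x] under a joint GMM on z = [x; y].
        import numpy as np
        from scipy.stats import multivariate_normal
        from sklearn.mixture import GaussianMixture

        def fit_joint_gmm(X, Y, n_components=16):
            Z = np.hstack([X, Y])                         # joint acoustic-visual vectors
            gmm = GaussianMixture(n_components=n_components, covariance_type="full").fit(Z)
            return gmm, X.shape[1]

        def mmse_map(gmm, dx, x):
            """MMSE estimate of the visual vector given the acoustic vector x."""
            w, mu, S = gmm.weights_, gmm.means_, gmm.covariances_
            # responsibilities of each component given x (marginal over the acoustic block)
            px = np.array([w[k] * multivariate_normal.pdf(x, mu[k, :dx], S[k, :dx, :dx])
                           for k in range(len(w))])
            h = px / px.sum()
            # mix the component-wise conditional means
            y = np.zeros(mu.shape[1] - dx)
            for k in range(len(w)):
                Sxx, Syx = S[k, :dx, :dx], S[k, dx:, :dx]
                y += h[k] * (mu[k, dx:] + Syx @ np.linalg.solve(Sxx, x - mu[k, :dx]))
            return y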

    Automatic lip contour extraction using the CLNF model

    In this paper, a new approach for extracting the inner lip contour of a speaker without using artifices is proposed. The method relies on a recent face contour extraction algorithm developed in computer vision, the Constrained Local Neural Field (CLNF), which provides 8 characteristic points (landmarks) delimiting the inner contour of the lips. Applied directly to our audiovisual data of the speaker, CLNF gives very good results in about 70% of cases, but errors remain for the other cases. We propose solutions for estimating a reasonable inner lip contour from the landmarks provided by CLNF, using spline interpolation to correct its failures and to extract the classical labial parameters A, B and S correctly. Evaluations on a database of 179 images confirm the performance of our algorithm. Keywords: CLNF model, spline, lip contour, labial parameters, visual speech.
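
    A minimal sketch of the spline step, assuming 8 ordered inner-lip landmarks and SciPy: a closed periodic spline is fitted through the points and simple lip parameters are read off the resulting contour. The A/B/S definitions below (aperture, width, area) are illustrative assumptions, not the paper's exact measurement protocol.

        # Minimal sketch: closed spline through 8 inner-lip landmarks and derived lip parameters.
        import numpy as np
        from scipy.interpolate import splprep, splev

        def inner_lip_contour(landmarks, n_points=200):
            """landmarks: (8, 2) array of inner-lip points ordered around the mouth."""
            pts = np.vstack([landmarks, landmarks[:1]])   # close the polygon for a periodic fit
            tck, _ = splprep([pts[:, 0], pts[:, 1]], s=0, per=True)
            u = np.linspace(0.0, 1.0, n_points)
            cx, cy = splev(u, tck)
            return np.column_stack([cx, cy])

        def lip_parameters(contour):
            """Aperture A, width B and area S (shoelace formula) of the closed contour."""
            A = np.ptp(contour[:, 1])                     # vertical extent
            B = np.ptp(contour[:, 0])                     # horizontal extent
            x, y = contour[:, 0], contour[:, 1]
            S = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
            return A, B, S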

    Cued Speech Automatic Recognition in Normal Hearing and Deaf Subjects

    This article discusses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs).

    Characterization of Cued Speech Vowels from the Inner Lip Contour.

    Cued Speech (CS) is a manual code that complements lip-reading to enhance speech perception from visual input. The phonetic translation of CS gestures needs to combine the manual CS information with information from the lips, taking into account the desynchronization delay (Attina et al. [1], Aboutabit et al. [2]) between these two flows of information. This paper focuses on the analysis of the lip flow for vowels in French Cued Speech. The vocalic lip targets are defined automatically at the instant of minimum velocity of the inner lip contour area parameter, constrained by the corresponding acoustic labeling. We discuss in particular the possibility of discriminating the vowels with geometric lip parameters, using the values at the instants of the vocalic targets when associated with a Cued Speech hand position.
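
    The target-detection rule lends itself to a short sketch. Assuming a per-frame inner-lip area signal and acoustically labeled vowel boundaries (both hypothetical inputs), the vocalic lip target is taken as the frame of minimum absolute velocity of the area parameter within the vowel interval.

        # Minimal sketch: lip target = frame of minimum |d(area)/dt| inside the vowel segment.
        import numpy as np

        def vocalic_target_frame(area, vowel_start, vowel_end, fps=50.0):
            """area: 1-D array of inner-lip area per video frame; boundaries are frame indices."""
            velocity = np.gradient(area) * fps            # finite-difference derivative in 1/s
            segment = np.abs(velocity[vowel_start:vowel_end])
            return vowel_start + int(np.argmin(segment))  # frame index of the vocalic target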

    Physical modeling of bilabial plosives production

    The context of this study is the physical modeling of speech production. The first step of our approach is to perform in-vivo measurements during the production of the vowel-consonant-vowel sequence /apa/. These measurements concern the intra-oral pressure, the acoustic pressure radiated at the lips, and labial parameters (aperture and width of the lips) derived from a high-speed video recording of the subject's face. In a second step, theoretical models from the speech production literature are investigated to predict the air flow through the lips. A model is validated by comparing its predictions with results obtained from measurements on a replica of the phonatory system. The same experimental set-up is then used to introduce an aerodynamic model of supraglottal cavity expansion. Finally, we carry out numerical simulations of a vowel-bilabial plosive-vowel utterance using these models. The simulation results highlight the influence of cheek expansion during the production of bilabial plosives.
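
    For orientation only, the sketch below shows the kind of quasi-steady Bernoulli orifice model commonly used to relate intra-oral pressure and lip aperture to the air flow through the lips; it is a textbook approximation under assumed inputs, not necessarily the specific model validated against the replica in this work.

        # Minimal sketch: quasi-steady Bernoulli flow through a rectangular lip opening.
        import math

        RHO_AIR = 1.2  # air density, kg/m^3

        def lip_volume_flow(delta_p, aperture, width):
            """Volume flow (m^3/s) for a pressure drop delta_p (Pa) across the lip constriction."""
            area = aperture * width                       # lip opening area, m^2
            if delta_p <= 0.0 or area <= 0.0:
                return 0.0                                # closed lips or no driving pressure
            return area * math.sqrt(2.0 * delta_p / RHO_AIR)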

    Adaptation of a deaf participant's lip production and classification: the case of vowels in the context of the Cued Speech (LPC) code.

    The phonetic translation of Cued Speech (CS) gestures needs to combine the manual CS information with the lip information, taking into account the desynchronization delay (Attina et al. [2], Aboutabit et al. [7]) between these two flows of information. This contribution focuses on modeling the lip flow in the case of French vowels. Classification models had previously been developed for a professional normal-hearing CS speaker (Aboutabit et al. [7]); these models are used here as a reference to study the classification of vowels produced by a deaf CS speaker. The best performance (92.8%) is obtained by adapting the deaf speaker's data to the reference models.
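
    The abstract does not specify the adaptation method, so the following is only an illustrative sketch: per-vowel Gaussian models trained on the reference speaker's lip parameters, with a least-squares affine mapping of the deaf speaker's features into the reference space before classification. Library calls are scikit-learn/NumPy; the affine mapping is an assumption, not necessarily the paper's adaptation.

        # Minimal sketch: reference Gaussian vowel models + affine adaptation of deaf data.
        import numpy as np
        from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

        def train_reference_classifier(lip_features, vowel_labels):
            """Per-vowel Gaussian models fitted on the normal-hearing reference speaker."""
            return QuadraticDiscriminantAnalysis(store_covariance=True).fit(lip_features, vowel_labels)

        def fit_affine_adaptation(deaf_feats, ref_feats):
            """Least-squares affine map deaf -> reference, from paired feature vectors."""
            X = np.hstack([deaf_feats, np.ones((len(deaf_feats), 1))])   # add bias column
            W, *_ = np.linalg.lstsq(X, ref_feats, rcond=None)
            return W

        def classify_adapted(classifier, W, deaf_feats):
            X = np.hstack([deaf_feats, np.ones((len(deaf_feats), 1))])
            return classifier.predict(X @ W)              # adapt, then use the reference models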

    The shadow of a doubt? Evidence for perceptuo-motor linkage during auditory and audiovisual close-shadowing

    One classical argument in favor of a functional role of the motor system in speech perception comes from the close-shadowing task, in which a subject has to identify and repeat an auditory speech stimulus as quickly as possible. The fact that close shadowing can occur very rapidly, and much faster than manual identification of the speech target, is taken to suggest that perceptually induced speech representations are already shaped in a motor-compatible format. Another argument is provided by audiovisual interactions, often interpreted within a multisensory-motor framework. In this study, we attempted to combine these two paradigms by testing whether the visual modality could speed the motor response in a close-shadowing task. To this aim, both oral and manual responses were evaluated during the perception of auditory and audio-visual speech stimuli, either clear or embedded in white noise. Overall, oral responses were faster than manual ones, but they were also less accurate in noise, which suggests that the motor representations evoked by the speech input may be coarse at a first processing stage. In the presence of acoustic noise, the audiovisual modality led to both faster and more accurate responses than the auditory modality. However, no interaction was observed between modality and response type. Altogether, these results are interpreted within a two-stage sensory-motor framework, in which the auditory and visual streams are integrated together and with internally generated motor representations before a final decision becomes available.